Goto

Collaborating Authors

 pairwise independence


The informativeness of the gradient revisited

arXiv.org Artificial Intelligence

In the past decade gradient-based deep learning has revolutionized several applications. However, this rapid advancement has highlighted the need for a deeper theoretical understanding of its limitations. Research has shown that, in many practical learning tasks, the information contained in the gradient is so minimal that gradient-based methods require an exceedingly large number of iterations to achieve success. The informativeness of the gradient is typically measured by its variance with respect to the random selection of a target function from a hypothesis class. We use this framework and give a general bound on the variance in terms of a parameter related to the pairwise independence of the target function class and the collision entropy of the input distribution. Our bound scales as $ \tilde{\mathcal{O}}(\varepsilon+e^{-\frac{1}{2}\mathcal{E}_c}) $, where $ \tilde{\mathcal{O}} $ hides factors related to the regularity of the learning model and the loss function, $ \varepsilon $ measures the pairwise independence of the target function class and $\mathcal{E}_c$ is the collision entropy of the input distribution. To demonstrate the practical utility of our bound, we apply it to the class of Learning with Errors (LWE) mappings and high-frequency functions. In addition to the theoretical analysis, we present experiments to understand better the nature of recent deep learning-based attacks on LWE.


Submodularity, pairwise independence and correlation gap

arXiv.org Artificial Intelligence

In this paper, we provide a characterization of the expected value of monotone submodular set functions with $n$ pairwise independent random inputs. Inspired by the notion of ``correlation gap'', we study the ratio of the maximum expected value of a function with arbitrary dependence among the random inputs with given marginal probabilities to the maximum expected value of the function with pairwise independent random inputs and the same marginal probabilities. Our results show that the ratio is upper bounded by: (a) $4/3$ for $n = 3$ with general marginal probabilities and any monotone submodular set function (b) $4/3$ for general $n$ with small and large marginal probabilities and any monotone submodular set function and (c) $4k/(4k-1)$ for general $n$, general identical probabilities and rank functions of $k$-uniform matroids. The bound is tight in all three cases. This contrasts with the $e/(e-1)$ bound on the correlation gap ratio for monotone submodular set functions with mutually independent random inputs (which is known to be tight in case (b)), and illustrates a fundamental difference in the behavior of submodular functions with weaker notions of independence. These results can be immediately extended beyond pairwise independence to correlated random inputs. We discuss applications in distributionally robust optimization and mechanism design and end the paper with a conjecture.


Variance Covariance Regularization Enforces Pairwise Independence in Self-Supervised Representations

arXiv.org Artificial Intelligence

Self-Supervised Learning (SSL) methods such as VICReg, Barlow Twins or W-MSE avoid collapse of their joint embedding architectures by constraining or regularizing the covariance matrix of their projector's output. This study highlights important properties of such strategy, which we coin Variance-Covariance regularization (VCReg). More precisely, we show that VCReg enforces pairwise independence between the features of the learned representation. This result emerges by bridging VCReg applied on the projector's output to kernel independence criteria applied on the projector's input. This provides the first theoretical motivations and explanations of VCReg. We empirically validate our findings where (i) we observe that SSL methods employing VCReg learn visual representations with greater pairwise independence than other methods, (i) we put in evidence which projector's characteristics favor pairwise independence, and show it to emerge independently from learning the projector, (ii) we use these findings to obtain nontrivial performance gains for VICReg, (iii) we demonstrate that the scope of VCReg goes beyond SSL by using it to solve Independent Component Analysis. We hope that our findings will support the adoption of VCReg in SSL and beyond.


Marginal Distance and Hilbert-Schmidt Covariances-Based Independence Tests for Multivariate Functional Data

Journal of Artificial Intelligence Research

We study the pairwise and mutual independence testing problem for multivariate functional data. Using a basis representation of functional data, we reduce this problem to testing the independence of multivariate data, which may be high-dimensional. For pairwise independence, we apply tests based on distance and Hilbert-Schmidt covariances as well as their marginal versions, which aggregate these covariances for coordinates of random processes. In the case of mutual independence, we study asymmetric and symmetric aggregating measures of pairwise dependence. A theoretical justification of the test procedures is established. In extensive simulation studies and examples based on a real economic data set, we investigate and compare the performance of the tests in terms of size control and power. An important finding is that tests based on distance and Hilbert-Schmidt covariances are usually more powerful than their marginal versions under linear dependence, while the reverse is true under non-linear dependence.


A Theory of Uncertainty Variables for State Estimation and Inference

arXiv.org Machine Learning

While it provides a good foundation to system modeling, analysis, and an understanding of the real world, its application to algorithm design suffers from computational intractability. In this work, we develop a new framework of uncertainty variables to model uncertainty. A simple uncertainty variable is characterized by an uncertainty set, in which its realization is bound to lie, while the conditional uncertainty is characterized by a set map, from a given realization of a variable to a set of possible realizations of another variable. We prove Bayes' law and the law of total probability equivalents for uncertainty variables. We define a notion of independence, conditional independence, and pairwise independence for a collection of uncertainty variables, and show that this new notion of independence preserves the properties of independence defined over random variables. We then develop a graphical model, namely Bayesian uncertainty network, a Bayesian network equivalent defined over a collection of uncertainty variables, and show that all the natural conditional independence properties, expected out of a Bayesian network, hold for the Bayesian uncertainty network. We also define the notion of point estimate, and show its relation with the maximum a posteriori estimate.


Pairwise Decomposition for Combinatorial Optimization in Graphical Models

AAAI Conferences

We propose a new additive decomposition of probability tables that preserves equivalence of the joint distribution while reducing the size of potentials, without extra variables. We formulate the Most Probable Explanation (MPE) problem in belief networks as a Weighted Constraint Satisfaction Problem (WCSP). Our pairwise decomposition allows to replace a cost function with smaller-arity functions. The resulting pairwise decomposed WCSP is then easier to solve using state-of-the-art WCSP techniques. Although testing pairwise decomposition is equivalent to testing pairwise independence in the original belief network, we show how to efficiently test and enforce it, even in the presence of hard constraints. Furthermore, we infer additional information from the resulting nonbinary cost functions by projecting and subtracting them on binary functions. We observed huge improvements by preprocessing with pairwise decomposition and project&subtract compared to the current state-of-the-art solvers on two difficult sets of benchmark.


Near-Uniform Sampling of Combinatorial Spaces Using XOR Constraints

Neural Information Processing Systems

We propose a new technique for sampling the solutions of combinatorial problems in a near-uniform manner. We focus on problems specified as a Boolean formula, i.e., on SAT instances. Sampling for SAT problems has been shown to have interesting connections with probabilistic reasoning, making practical sampling algorithms for SAT highly desirable. The best current approaches are based on Markov Chain Monte Carlo methods, which have some practical limitations.


Near-Uniform Sampling of Combinatorial Spaces Using XOR Constraints

Neural Information Processing Systems

We propose a new technique for sampling the solutions of combinatorial problems in a near-uniform manner. We focus on problems specified as a Boolean formula, i.e., on SAT instances. Sampling for SAT problems has been shown to have interesting connections with probabilistic reasoning, making practical sampling algorithms for SAT highly desirable. The best current approaches are based on Markov Chain Monte Carlo methods, which have some practical limitations.


Near-Uniform Sampling of Combinatorial Spaces Using XOR Constraints

Neural Information Processing Systems

We propose a new technique for sampling the solutions of combinatorial problems ina near-uniform manner. We focus on problems specified as a Boolean formula, i.e.,on SAT instances. Sampling for SAT problems has been shown to have interesting connections with probabilistic reasoning, making practical sampling algorithms for SAT highly desirable. The best current approaches are based on Markov Chain Monte Carlo methods, which have some practical limitations. Our approach exploits combinatorial properties of random parity (XOR) constraints to prune away solutions near-uniformly. The final sample is identified amongst the remaining ones using a state-of-the-art SAT solver.